Multilayer Document Compression Algorithm

Authors

  • Hui Cheng
  • Charles A. Bouman
Abstract

In this paper, we propose a multilayer document compression algorithm. The algorithm first segments a scanned document image into classes such as text, images, and background, then compresses each class with an algorithm designed specifically for that class. Two segmentation algorithms are investigated: a general-purpose image segmentation algorithm called the trainable sequential MAP (TSMAP) algorithm, and a rate-distortion optimized segmentation (RDOS) algorithm. Experimental results show that the multilayer compression algorithm can achieve a much lower bit rate than most conventional algorithms such as JPEG at similar subjective distortion levels. We also find that the RDOS method produces more robust segmentations than TSMAP by eliminating misclassifications, which can sometimes cause severe artifacts.
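
The segment-then-compress idea in the abstract can be illustrated with a small Python sketch. This is not the TSMAP or RDOS algorithm, and it is not the authors' coder: the block size, the thresholds, the classification heuristic, and the per-class encoders (`classify_block`, `encode_block`, `compress_document`) are all hypothetical choices made only to show how a document might be split into layers that are each coded differently.

```python
import zlib

BLOCK = 8  # block size in pixels (an assumption for this sketch)


def classify_block(pixels, flat_range=16, bimodal_fraction=0.9):
    """Label a block with a toy heuristic (not TSMAP or RDOS)."""
    lo, hi = min(pixels), max(pixels)
    if hi - lo < flat_range:
        return "background"  # nearly constant intensity
    # Text tends to be two-toned: most pixels sit near one of the extremes.
    margin = (hi - lo) // 4
    near_extreme = sum(1 for p in pixels if p - lo <= margin or hi - p <= margin)
    if near_extreme >= bimodal_fraction * len(pixels):
        return "text"
    return "picture"


def encode_block(pixels, label):
    """Encode a block with a class-specific (hypothetical) representation."""
    if label == "background":
        return bytes([sum(pixels) // len(pixels)])  # mean intensity only
    if label == "text":
        lo, hi = min(pixels), max(pixels)
        mid = (lo + hi) / 2
        bits = 0
        for p in pixels:  # one bit per pixel plus the two tone values
            bits = (bits << 1) | int(p > mid)
        return bytes([lo, hi]) + bits.to_bytes((len(pixels) + 7) // 8, "big")
    return bytes(p >> 4 for p in pixels)  # picture: keep the top 4 bits


def blocks(image):
    """Yield the pixels of each BLOCK x BLOCK tile in raster order."""
    height, width = len(image), len(image[0])
    for top in range(0, height, BLOCK):
        for left in range(0, width, BLOCK):
            yield [image[y][x]
                   for y in range(top, min(top + BLOCK, height))
                   for x in range(left, min(left + BLOCK, width))]


def compress_document(image):
    """Return per-block labels and one zlib-compressed stream per class."""
    labels = []
    streams = {"background": bytearray(), "text": bytearray(), "picture": bytearray()}
    for pixels in blocks(image):
        label = classify_block(pixels)
        labels.append(label)
        streams[label] += encode_block(pixels, label)
    layers = {name: zlib.compress(bytes(data)) for name, data in streams.items()}
    return labels, layers


# Usage: a grayscale image is a list of rows of integers in 0..255.
page = [[255] * 8 + [0] * 4 + [255] * 4 for _ in range(8)]
labels, layers = compress_document(page)
```

Here the label map plays the role of the segmentation, and each entry of `layers` is a separately coded class. A real system would replace the heuristic with a trained or rate-distortion optimized segmenter and use a dedicated coder for each class.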

Similar resources

Document compression using rate-distortion optimized segmentation

Effective document compression algorithms require that scanned document images be first segmented into regions such as text, pictures, and background. In this paper, we present a multilayer compression algorithm for document images. This compression algorithm first segments a scanned document image into different classes, then compresses each class using an algorithm specifically designed for t...

Document Image Segmentation and Compression

Cheng, Hui, Ph.D., Purdue University, August, 1999. Document Image Segmentation and Compression. Major Professor: Charles A. Bouman. In the first part of this research, we propose an image segmentation algorithm called the trainable sequential MAP (TSMAP) algorithm. The TSMAP algorithm is based on a multiscale Bayesian approach. It has a novel multiscale context model which can capture complex ...

Efficient Conversion of Digital Documents to Multilayer Raster Formats

How can we turn the description of a digital (i.e. electronically produced) document into something efficient for multilayer raster formats [1, 6, 4]? It is first shown that a foreground/background segmentation without overlapping foreground components can be more efficient for viewing or printing. Then, a new algorithm that prevents overlaps between foreground components while optimizing both ...

Optimizing block-thresholding segmentation for multilayer compression of compound images

Compound document images contain graphic or textual content along with pictures. They are a very common form of documents, found in magazines, brochures, Web sites, etc. We focus our attention on the mixed raster content (MRC) multilayer approach for compound image compression. We study block thresholding as a means to segment an image for MRC. An attempt is made to optimize the block threshold...

Pre- and postprocessing for multilayer compression of scanned documents

The mixed raster content (MRC) document-compression standard (ITU T.44) specifies a multilayer representation of a document image. The model is very efficient for representing sharp text and graphics over a background. However, its binary selection layer compromises the representation of scanned data and soft edges. Typical segmentation algorithms that split up the document into layers tend to ...

Publication date: 1999